Combining Heterogeneous Knowledge Resources for Improved Distributional Semantic Models
Abstract
The Explicit Semantic Analysis (ESA) model, which is based on term co-occurrences in Wikipedia, has been regarded as a state-of-the-art semantic relatedness measure in recent years. We provide an analysis of the important parameters of ESA using datasets in five different languages. Additionally, we propose using ESA with multiple lexical semantic resources, thus exploiting multiple sources of evidence of term co-occurrence to improve over the Wikipedia-based measure. Exploiting the improved robustness and coverage of the proposed combination, we report improved performance over single resources in four tasks: word semantic relatedness, solving word choice problems, classification of semantic relations between nominals, and text similarity.
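The idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the tf-idf weighting, the concatenation of concept vectors across resources, the function names, and the toy "wiki" and "dictionary" resources are all assumptions made for illustration. Each term is represented by its weights over the concepts (documents) of every resource, and relatedness is the cosine of two such vectors.

```python
import math
from collections import Counter


def build_concept_index(concepts):
    """Map each term to {concept_id: tf-idf weight} for one resource.

    `concepts` maps a concept id to its descriptive text (hypothetical toy data).
    """
    n = len(concepts)
    tf = {cid: Counter(text.lower().split()) for cid, text in concepts.items()}
    df = Counter(term for counts in tf.values() for term in counts)
    index = {}
    for cid, counts in tf.items():
        for term, count in counts.items():
            idf = math.log(n / df[term])
            if idf > 0:  # terms occurring in every concept carry no signal
                index.setdefault(term, {})[cid] = count * idf
    return index


def concept_vector(term, indexes):
    """Concatenate a term's concept vectors from several resources (assumed strategy)."""
    vec = {}
    for name, index in indexes.items():
        for cid, weight in index.get(term, {}).items():
            vec[(name, cid)] = weight
    return vec


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v[k] for k, w in u.items() if k in v)
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(
        sum(w * w for w in v.values())
    )
    return dot / norm if norm else 0.0


def relatedness(a, b, indexes):
    """ESA-style relatedness of two terms over one or more resources."""
    return cosine(concept_vector(a, indexes), concept_vector(b, indexes))


# Two tiny hypothetical resources standing in for Wikipedia and a dictionary.
wiki = build_concept_index({
    "Cat": "cat feline pet animal",
    "Dog": "dog canine pet animal",
    "Car": "car vehicle engine road",
})
dictionary = build_concept_index({
    "cat_n": "cat small feline mammal",
    "dog_n": "dog domestic canine mammal",
    "car_n": "car motor vehicle",
})

single = {"wiki": wiki}
combined = {"wiki": wiki, "dictionary": dictionary}

print(relatedness("cat", "mammal", single))    # "mammal" is missing from the toy wiki
print(relatedness("cat", "mammal", combined))  # the second resource adds coverage
```

In this toy setup, "mammal" never occurs in the Wikipedia-like resource, so the single-resource score for "cat" and "mammal" is zero, while the combined index yields a positive score. This mirrors, in miniature, the coverage argument made in the abstract.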
Related resources
A Framework for Enriching Lexical Semantic Resources with Distributional Semantics
We present an approach to combining distributional semantic representations induced from text corpora with manually constructed lexical-semantic networks. While both kinds of semantic resources are available with high lexical coverage, our aligned resource combines the domain specificity and availability of contextual information from distributional models with the conciseness and high quality ...
Towards a Distributional Semantic Web Stack
The capacity of distributional semantic models (DSMs) to discover similarities over large-scale, heterogeneous, and poorly structured data makes them a promising universal and low-effort framework to support semantic approximation and knowledge discovery. This position paper explores the role of distributional semantics in the Semantic Web vision, based on state-of-the-art distributional-rela...
Building Semantic Networks from Plain Text and Wikipedia with Application to Semantic Relatedness and Noun Compound Paraphrasing
The construction of suitable and scalable representations of semantic knowledge is a core challenge in Semantic Computing. Manually created resources such as WordNet have been shown to be useful for many AI and NLP tasks, but they are inherently restricted in their coverage and scalability. In addition, they have been challenged by simple distributional models on very large corpora, questioning...
Synonym extraction and abbreviation expansion with ensembles of semantic spaces
BACKGROUND Terminologies that account for variation in language use by linking synonyms and abbreviations to their corresponding concept are important enablers of high-quality information extraction from medical texts. Due to the use of specialized sub-languages in the medical domain, manual construction of semantic resources that accurately reflect language use is both costly and challenging, ...
Towards an Approximative Ontology-Agnostic Approach for Logic Programs
Distributional semantics focuses on the automatic construction of a semantic model based on the statistical distribution of colocated words in large-scale texts. Deductive reasoning is a fundamental component for semantic understanding. Despite the generality and expressivity of logical models, from an applied perspective, deductive reasoners are dependent on highly consistent conceptual models...